A Multiplatform Chemometric Approach to Modeling of Mosquito Repellents

variables to a smaller set of uncorrelated components. Then least squares regression is

performed on these new components, instead of the original data. ULR, as the simplest

method, is usually the first step in the regression modeling.
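The component-based regression described above can be sketched in a few lines. This is a minimal illustration, assuming the uncorrelated components are principal components obtained from a singular value decomposition of the centered descriptor matrix. The function name `pcr_fit` and the data are hypothetical and synthetic, and are not taken from the study.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Regress y on the leading principal components of X (illustrative sketch)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centered descriptor matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T              # loadings, shape (p, k)
    T = Xc @ V                           # uncorrelated component scores, shape (n, k)
    # Ordinary least squares on the scores instead of the original variables.
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    beta = V @ gamma                     # coefficients expressed in the original variables
    intercept = y_mean - x_mean @ beta
    return intercept, beta, T

# Synthetic data: the first two columns are almost perfectly correlated.
rng = np.random.default_rng(0)
z = rng.normal(size=(30, 2))
X = np.column_stack([z[:, 0], z[:, 0] + 0.01 * rng.normal(size=30), z[:, 1]])
y = 2.0 * z[:, 0] - 1.0 * z[:, 1] + 0.1 * rng.normal(size=30)

intercept, beta, T = pcr_fit(X, y, n_components=2)
y_hat = intercept + X @ beta
score_corr = np.corrcoef(T, rowvar=False)  # off-diagonal entries are numerically zero
```

Keeping two components here discards the direction in which the two nearly identical columns differ, which is the direction that makes ordinary least squares on the raw variables unstable.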

Linear modeling can be illustrated by QSAR modeling of the repellence index (Rindex)
of a set of several natural compounds (carvacrol, thymol, cuminic acid, n-butyl cinnamate,
ethyl cinnamate, benzyl benzoate, lauric acid) and some newly synthesized compounds
(Syn1 – KO5, Syn2 – KO10, Syn3 – KO2, Syn4 – KO3, Syn5 – KO9, Syn6 – KO6,
Syn7 – KO4, Syn8 – KO7, Syn9 – KO11, Syn10 – KO13, Syn11 – KO12, Syn12 – KO8,
Syn13 – KO16). The repellence indices of these compounds toward A. gambiae females
were published by Thireou et al. (2018). The simplest model is the ULR model, which
correlates Rindex with the boiling point (BP) of the compounds:

$$\mathrm{ULR}:\quad \mathrm{Rindex} = 252.8713\,(\pm 38.36695) - 0.3041248\,(\pm 0.05775952)\,\mathrm{BP} \tag{9.1}$$
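As a quick illustration of how the ULR model is used for prediction, the sketch below evaluates Eq. (9.1) for a single boiling point. It assumes that the operator lost between the intercept and the BP term in the printed equation is a minus sign. The input value of 600 is hypothetical, and BP must be given in the same units used to fit the model, which this section does not state.

```python
# Point estimates from Eq. (9.1); the negative slope is an assumption (see lead-in).
ULR_INTERCEPT = 252.8713
ULR_SLOPE_BP = -0.3041248

def predict_rindex_ulr(bp):
    """Predicted repellence index from boiling point using the ULR model."""
    return ULR_INTERCEPT + ULR_SLOPE_BP * bp

bp_example = 600.0  # hypothetical boiling point, not a compound from the study
rindex_example = predict_rindex_ulr(bp_example)
print(f"BP = {bp_example:.1f} -> predicted Rindex = {rindex_example:.2f}")
```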

The MLR models correlate the Rindex of the same group of compounds with more than
one molecular descriptor. The MLR1 model predicts Rindex from critical pressure
(CP) and calculated molar refractivity (CMR):

$$\mathrm{MLR1}:\quad \mathrm{Rindex} = 404.7651\,(\pm 56.47803) - 4.499053\,(\pm 1.119779)\,\mathrm{CP} - 36.69531\,(\pm 5.468419)\,\mathrm{CMR} \tag{9.2}$$

This model can be presented as a 3D surface plot (Figure 9.3), which makes it easy to
see what values of CP and CMR a compound should have to achieve a desirable Rindex.
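The response surface behind such a plot can be sketched by evaluating the MLR1 model on a grid of CP and CMR values. This is only an illustration: it assumes both lost operators in Eq. (9.2) are minus signs, and the axis ranges and the target value of 50 are hypothetical choices, not values from the study.

```python
import numpy as np

# Point estimates from Eq. (9.2); negative signs are an assumption (see lead-in).
B0, B_CP, B_CMR = 404.7651, -4.499053, -36.69531

def predict_rindex_mlr1(cp, cmr):
    """Predicted repellence index from CP and CMR using the MLR1 model."""
    return B0 + B_CP * cp + B_CMR * cmr

# Hypothetical descriptor ranges chosen only to demonstrate the grid evaluation.
cp_axis = np.linspace(20.0, 40.0, 5)
cmr_axis = np.linspace(4.0, 8.0, 5)
CP, CMR = np.meshgrid(cp_axis, cmr_axis)
surface = predict_rindex_mlr1(CP, CMR)   # one predicted Rindex per grid node

# Grid nodes whose predicted Rindex reaches a hypothetical target value.
target = 50.0
meets_target = surface >= target
```

The arrays `CP`, `CMR` and `surface` have the shapes expected by Matplotlib's `Axes3D.plot_surface`, so the same grid can be drawn directly as a 3D surface.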

The MLR2 model presents the relationship between Rindex and three independent vari-

ables: BP, total polar surface area (tPSA) and calculated lipophilicity descriptor (ClogP):

$$\mathrm{MLR2}:\quad \mathrm{Rindex} = 248.8165\,(\pm 37.2906) - 0.480915\,(\pm 0.06351143)\,\mathrm{BP} + 1.853831\,(\pm 0.4399472)\,\mathrm{tPSA} + 16.48189\,(\pm 5.434551)\,\mathrm{ClogP} \tag{9.3}$$

This model points to three molecular features that affect the repellence of the studied
series of compounds. Since ClogP has the largest regression coefficient in this model,
the lipophilicity parameter has the greatest influence on Rindex. The most suitable
descriptors for the MLR models were selected with the all-possible-regressions routine
of the NCSS 2007 program from a set of descriptors comprising boiling point (BP),
melting point (MP), critical temperature (CT), critical pressure (CP), critical volume (CV),
Gibbs energy (GE), lipophilicity (logP), molar refractivity (MR), total polar surface area
(tPSA), calculated lipophilicity descriptor (ClogP) and calculated molar refractivity (CMR).
All the descriptors were calculated with the ChemBioDraw Ultra 13.0 program
(PerkinElmer Inc.).
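The idea of an all-possible-regressions search can be sketched as follows: every descriptor subset up to a chosen size is fitted by least squares and the subsets are ranked by a goodness-of-fit criterion. This is a generic illustration, not the NCSS routine. The ranking criterion (adjusted R²), the function names and the random data are all assumptions made for the example, and only a few of the descriptor names are reused as column labels.

```python
import itertools
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of an ordinary least squares fit with intercept."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def all_possible_regressions(X, y, names, max_size):
    """Fit every descriptor subset up to max_size and rank by adjusted R^2."""
    results = []
    for size in range(1, max_size + 1):
        for idx in itertools.combinations(range(X.shape[1]), size):
            score = adjusted_r2(X[:, list(idx)], y)
            results.append((score, tuple(names[i] for i in idx)))
    return sorted(results, reverse=True)

# Synthetic data: 20 rows, response built from the "CP" and "CMR" columns only.
rng = np.random.default_rng(1)
names = ["BP", "MP", "CP", "tPSA", "ClogP", "CMR"]
X = rng.normal(size=(20, len(names)))
y = 3.0 * X[:, 2] - 2.0 * X[:, 5] + 0.1 * rng.normal(size=20)

ranking = all_possible_regressions(X, y, names, max_size=3)
best_score, best_subset = ranking[0]
print(best_subset, round(best_score, 3))
```

With six candidate descriptors and subsets of up to three, the search fits 41 models. The count grows combinatorially with the number of descriptors, which is why exhaustive search is practical only for small descriptor pools such as the eleven listed above.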

Although mathematically the simplest and easiest to interpret, ULR models often can-

not fully describe the dependence of the biological response on the molecular structure,